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Abstract 
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This paper presents a new Bayesian algorithm for retrieving surface rain rate from Tropical 
Rainfall Measurements Mission (TRMM) Microwave Imager (TMI) over the ocean, along with 
validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach 
offers a rigorous basis for optimally combining multichannel observations with prior knowledge. 
While other rain rate algorithms have been published that are based at least partly on Bayesian 
reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes' 
Theorem to yield not just a single rain rate, but rather a continuous posterior probability 
distribution of rain rate. 

To advance our understanding of theoretical benefits of the Bayesian approach, we have 
conducted sensitivity analyses based on two synthetic datasets for which the “true” conditional 
and prior distribution are known. Results demonstrate that even when the prior and conditional 
likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not 
the result of a defect of the Bayesian formalism but rather represents the expected outcome when 
the physical constraint imposed by the radiometric observations is weak, due to saturation 
effects. It is also suggested that the choice of the estimators and the prior information are both 
crucial to the retrieval. In addition, the performance of our Bayesian algorithm is found to be 
comparable to that of other benchmark algorithms in real-world applications, while having the 
additional advantage of providing a complete continuous posterior probability distribution of 
surface rain rate. 
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1. Introduction 


V 


Satellite passive microwave observations are now widely used to estimate global surface 
rainfall (Adler et al. 2001). Inference of surface rainfall R from microwave brightness 
temperatures T B would be less troublesome if the relationship between these two variables were 
unique and reasonably linear. Unfortunately, not only is the relationship nonmonotonic owing to 
the competing effects of scattering and emission (Petty 1994a), but a variety of microphysical 
and environmental factors introduces significant variability into the relationship. This ambiguity 
is compounded by spatial variability in the rain intensity when the strongly nonlinear function of 
local rain rate is averaged over a finite instrument field-of-view (FOV) of order 10 km or larger 
(Wilheit 1986; Petty 1994a; Petty 1994b). As a result, the rainfall retrieval problem requires not 
only a suitable physical model but also a proper accounting for the statistical variability in the 
relationship between FOV-averaged rain rate and FOV-averaged microwave observables. 

Bayes' Theorem states that for a given observation vector P (e.g., multichannel 
microwave observations), the posterior probability distribution of the parameter R to be 
estimated (e.g., rain rate) is proportional to the conditional likelihood times the prior distribution: 

7r(/?IP)°c/(PI/?)-7r(/?) (1) 

where y(P|i?) is a conditional probability density function (PDF) that specifies the probability 
distribution of the observation P given parameter R. Since this distribution incorporates 
information concerning the physical response of the observations P to the parameter R (as well 
as the statistical variability in that response), we will refer to this part as the physical model. 
n(R) summarizes our prior knowledge of the parameter before data are seen. The interaction of 
the physical model and the prior (e.g., climatological) probability distribution of R determines 
the so-called posterior distribution fl(^|P); i.e., the new PDF of R in light of the observations P. 
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Normally, the effect of P is to reduce the spread of /t(./?|P) relative to 7i(R); the degree of 


.A 




reduction is a measure of the information content of P. 

Bayes' theorem is not an algorithm per se, but it offers a rigorous and completely general 
theoretical framework for retrieving atmospheric variables from remote sensing measurements. 
The major practical obstacle to its routine application is the need to specify the prior distribution 
n(R) and conditional distribution/(P|R). When the parameter R is a scalar, the specification of 

the first of these poses no major difficulty. It is more difficult when the parameter to be retrieved 
is an iV-dimensional vector (e.g., a hydrometeor profile), since the prior distribution is then a 
multivariate function that must typically be estimated from a very large ensemble of model 
simulations. For example, a number of papers have explicitly invoked Bayes' theorem in the 
design of an algorithm to infer rain rate and other rain cloud properties from multichannel 
microwave radiances (Evans et al. 1995; Kummerow et al. 1996, 2001; Olson et al. 1996, 2005; 
Bauer et al. 2001; Marzano et al. 2002; Tassa et al. 2003; Michele et al. 2005). These typically 
attempt to retrieve a single “best” vertical hydrometeor profile and an associated surface rain 
rate. 

Regardless of whether R is a scalar or a vector, the accurate specification of^(P|i?) can 
pose a major challenge, as it depends on accurate modeling of both the physical and statistical 
properties of the mapping from R to the observation vector P. The higher the dimensionality of 
both, the more difficult it is to obtain a sufficiently large and varied sample of either observations 
or simulations to accurately represent to conditional distribution. 

Partly for this reason, we are unaware of any previous algorithm that fully exploits Bayes' 
theorem to explicitly provide a continuous posterior PDF of rain rate. Rather, the above 
algorithms are generally formulated so as to yield a single surface rain rate for each sensor field- 
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of-view (FOV). In the Goddard Profiling algorithm (GPROF) (Kummerow et al. 1996; Olson et 
al. 1996), the Bayes method is essentially reduced to a table lookup. Consequently, retrievals 
from GPROF are determined only by how close the observation vector is to the nearest 
candidates of the database, and by how often similar cloud profiles occur in the database. 
Furthermore, existing Bayesian retrieval algorithms depend on the representativeness of 
independent cloud-radiative databases and on assumptions about the form of conditional and 
prior likelihoods (e.g., Gaussian properties), even though training data are known to disagree 
with those assumptions to at least some degree. 

Here we describe a new Bayesian algorithm for rain rate retrieval over the ocean using 
dual-polarization passive microwave images from the conically-scanning Tropical Rainfall 
Measuring Mission (TRMM) Microwave Imager (TMI). Unlike previous algorithms, this one is 
based on explicit numerical evaluation of (1). We are able to do this in part because we limit our 
attention to a low-dimensional problem - that of estimating a single scalar FOV-averaged rain 

rate R from only three microwave observables, the attenuation index P (defined below) at 10.65, 
19.35, and 37 GHz. 

This new algorithm has two unique characteristics. Unlike other Bayesian methods, ours 
is based on explicit mathematical models of the conditional likelihood /(Pj/?) and the prior 
distribution MR), based on empirical fits to data derived from both simulations and actual 

observations. Most importantly, unlike other methods, the result of our method is not a single 
“best” rain rate but rather a complete posterior probability distribution. A major shortcoming of 
existing rain rate retrieval methods has been the lack of a basis for systematically and rigorously 
characterizing uncertainty. Particularly in view of the highly skewed distribution of rain rates 
and of the non-linear response of microwave radiometers to surface rainfall, the quantification of 
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uncertainty is complicated by the likelihood that errors will be non-Gaussian. The lack of 
quantitative error information in turn hampers the optimal assimilation of rain rate estimates into 
models. Efforts have been made in quantifying inherent uncertainty of retrieved rain rates in 
GPROF (Olson et al. 2005) and the Bayesian Algorithm for Microwave-based Precipitation 
Retrieval (BAMPR) (Tassa et al. 2003; Michele et al. 2005) from selected cloud profiles of 
databases. However, the algorithm described in this paper shows that it is possible to directly 
applies Bayes' Theorem so that, given an observation vector from the TMI, one may obtain a 
complete posterior probability density function (PDF) of rain rate rather than merely a single 
“best” estimate of rain rate at that location. 

This paper focuses on (a) the algorithm definition, (b) key aspects of the performance of 
the Bayesian methodology under controlled conditions; and (c) applications to TMI data. There 
are several components to (b). The first pertains to the performance of a Bayesian rain rate 
algorithm under ideal conditions - i.e., when bothy(P|^) and 7i(R ) are specified perfectly. We 
will show that, even under such ideal conditions, systematic biases in retrieved intensity are a 
natural result when the physical information in the observations is insufficient to strongly 
constrain the results (e.g., in high rain rates, where saturation of microwave radiances tends to 
occur). 

Since the “true”y(P|^) and n(R ) are normally only approximately known for real-world 

retrievals, the second component of our sensitivity analysis pertains to the influence of errors in 
the specification of these functions on the quality of the retrievals. Evans et al. (1995), using 
simulated data, evaluated which covariates had the greatest influence on the accuracy of 
retrievals. Nevertheless, their results were subject also to imperfections in the cloud model and 
radiative transfer model, as well as to simplifying assumptions built into both their conditional 
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and prior distributions. Unlike their work, we describe the results of sensitivity tests using two 
synthetic data sets where the “true” conditional and prior distributions are perfectly known. We 
are therefore able to examine the intrinsic limits in the retrievability of rain rate (based on our 
observables) without regard to possible errors in the models themselves. 

There is the important question of how to interpret the resulting posterior rain rate PDF in 
terms of a single “best” rain rate for any given sensor FOV. We will show that, owing to the 
highly skewed properties of the posterior PDF, the choice of estimator (e.g., maximum 
likelihood vs. minimum variance or expectation value) is of critical importance and needs to be 
selected carefully with the particular application in mind. Note that the former was used in 
Evans et al. (1995) and referred to as the Maximum A posteriori Probability (MAP) in Marzano 
et al. (1999), while the latter was used in GPROF and BAMPR and referred to as minimum mean 
square (MMS) criterion in Tassa et al. (2003) and Michele et al. (2005). 

Applications to TMI data include implements of our new Bayesian rain rate retrieval 
algorithm and validations against estimates from TRMM Precipitation Radar (PR). PR is an 
active microwave sensor that scans in a cross-track strategy from nadir to 17°. The swath width 

is about 215 km and the minimum detectable threshold of PR reflectivity is about 17 dBZ in the 
absence of attenuation. The horizontal resolution is about 4.3 km at nadir, while the vertical 
resolution is 0.25 km. In addition, we compare performance of our Bayesian algorithm to that of 
other benchmark algorithms. The algorithm may be judged successful if its overall performance 
at estimating surface rain rate (based on an appropriate estimator applied to the posterior PDF, 
such as the expectation value) is not significantly worse that that of other algorithms while also 
providing the detailed error information that other algorithms lack. 
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In the next section, we begin by reviewing the definition of the normalized polarization 
(or attenuation index) P (Petty 1994a). Because of certain desirable properties, this linear 
transformation of dual-polarization brightness temperatures serves as the fundamental 
microwave “observable” in our algorithm. In section 3, we introduce highly simplified but 
reasonably realistic analytic models for the local dependence of P on local rain rate at each 
frequency. The statistical variability of this relationship due to spatial averaging over 
realistically non-uniform rainfall is then evaluated by applying these relationships to a large set 
of radar-derived rain rate fields. We show that the statistical distribution of our simulated P in 3- 
space is similar to that observed in actual observations of oceanic rainfall by the TMI. Section 4 
describes the construction of the prior and conditional distribution functions required by the 
Bayesian. The design and results of the sensitivity tests are discussed in Section 5. Section 6 
presents real-world applications and validation of this Bayesian algorithm, followed by summary 
and discussions. 

2. TMI Attenuation Indices P 

a. Instrument description 

The TMI measures dual polarized brightness temperatures at 10.65, 19.35, 37.00, and 
85.50 GHz, and at 21.3 GHz with vertical polarization only. Detailed descriptions of other TMI 
characteristics can be found in Kummerow et al. (1998) and Bauer and Bennartz (1998). For 
convenience, we henceforth denote vertically polarized brightness temperatures at TMI channels 
as Tiov, Ti 9 v, Tj iv, Tjw, and T%w, and replace V with H for horizontal polarization. 

As discussed by Petty (1994a), individual channel brightness temperatures have 
significant shortcomings as the primary observables in a physically based rain rate retrieval 
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algorithm. Each channel, regardless of whether it is horizontally or vertically polarized, will 
respond to a variety of environmental variables, as well as to a mixture of both emission and 
scattering from the rain cloud itself. Petty therefore proposed the use of linear transformations of 
dual-polarization brightness temperatures 7Vand Th at any given frequency. These effectively 
decouple the emission (attenuation) and scattering contributions into two separate variables, P 
and S, as well as factoring out background variability due to variations in atmospheric water 
vapor, surface roughness, etc. 


b. Definition 

We limit our attention here to the attenuation index (or normalized polarization) P, which 
is defined as 


T -T 

p _ jv _ H 


T -T 

1 V,0 1 H.O 


( 2 ) 


where TV and T H are the vertically and horizontally polarized brightness temperatures; 7V,o and 
Tito are the estimated or modeled brightness temperatures in the absence of rain or cloud. 
Ideally, the value of P therefore falls in the range [0,1], where 1 corresponds to a cloud- free 
pixel, and values approaching 0 represent a very opaque atmospheric condition generally 
associated with heavy precipitation. 

The use of the attenuation index P in the retrieval algorithm has three advantages. First, 
unlike brightness temperature, it decreases monotonically with increasing rainfall intensity. 
Second, the attenuation index is not sensitive to the background variability because P is mainly 
determined by rain cloud optical thickness. Third, for the special case of a horizontally 
homogeneous rain layer, P and transmittance t obey an approximate power-law relationship, 

P = t a , with a = 1.7. Therefore, in this limiting case, P index yields a direct indication of the 
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rain cloud transmittance. In this paper, our microwave observables are the attenuation indices at 
10.65, 19.35, and 37.0 GHz. Our observation vector P is therefore given by (Pi 0 , P\% P 37 ) • 

c. Implementation for TMI 

In order to convert satellite-observed brightness temperatures 7V and 7 h at a given 
frequency to attenuation index P, we require estimates of the cloud-free background brightness 
temperatures 7V,o and 7h,o- To a good approximation, these are functions of total column water 
vapor V and surface wind speed U. Both of these variables may be directly estimated from the 
passive microwave observations themselves, provided that care is taken to exclude FOVs 
contaminated by rain or land. The retrieved fields of V and U are then spatially interpolated into 
areas of precipitation and used as the basis for estimating 7V, o and 7h,o- It is not essential that 
these estimates be very precise, only that they account for most of the variation in the 
background polarization difference 7V,o - 7h,o- Uncertainties in the estimate of this quantity are 
treated as part of the inherent observational error in P. 

We derived our own empirical algorithms for V and U from match-ups between TMI 
radiances and surface observations. For column water vapor, we matched radiosonde 
observations to TMI overpasses in January and July 1999. The match-up procedure was similar 
to Alishouse et al. (1990) and Petty (1994b). Further details are given in Chiu (2003). The 
resulting statistical water vapor algorithm is given by 

V [kg m' 2 ] = 128.57 + 33.941n(290- T l9V ) 

-72. 1 3 ln(290 - T 1W ) + 10.48 ln(290 -T„ H ). (3) 

Match-ups between the TMI measurements and surface buoy wind data from NOAA Marine 
Environmental Buoy database (Chiu 2003) yielded the following algorithm for wind speed: 
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( 4 ) 


U [m s'] = 1 30.908 + 0.1 10T m +0A2ST {0H -0.0347J 9V 

-0. 1 157; 9W - 0.079 T 2W - 1 . 121 T 37V + 0.543 T 31H . 

Empirical expressions for the background brightness temperatures were derived from 489 TMI 
orbits in July 1 999: 

T lovo = 1 54. 1 + 0.076V + 0.2417 + 0.477; 

T 10 H,o = 73.8 + 0. 14V + 0.90 U + 0.24 T s 
ln(300- T l9V O ) = 4.89 - 0.0072V -0.00171/ -0.00257; 

I n(300 -T i9HO ) = 5.39- 0.0078V - 0.00631/ - 0.000527; 
ln(300 - T 37VO ) = 4.65 - 0.0058V + 0.000551/ - 0.000697; 

ln(300 - T 31HO ) = 5.22 - 0.0065V - 0.00801/ + 0.0003 17;, (5) 

where V and U were estimated by (3) and (4), and sea surface temperature Ts (°C) was obtained 
from a climatological database (obtained from NASA/GSFC). 

3. Radar-radiative simulations 

a. Simulated P for realistic rain fields 

A key part of our algorithm is the specification of /(PI/?), which depends on both the 
physical and statistical properties of rainfall, especially the variable effects of beam-filling, 
which we take to be the single most factor determining the normalized polarization (this is not 
true for the single-channel brightness temperatures). Since actual match-up data from 
microwave and rainfall measurements or detailed model simulations do not exist in a sufficient 
quantity, we rely on simulated data derived from high-resolution radar composites. 
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1) RADAR DATA 

As previously mentioned, beam-filling errors due to inhomogeneities of rain clouds result 
in a non-unique relationship between rain rate and microwave signal, and these contribute to 
significant ambiguities in passive microwave retrievals. In order to account for rain cloud 
inhomogeneity in our specification of /( P I R), we used NWS WSR-88D radar operational 
reflectivity data to obtain spatially realistically rain rate fields for use in simple radiative transfer 
simulations designed to capture the contribution of horizontal inhomogeneity to fluctuations in 
the P-R relationship for any given frequency. 

The reflectivity data were operational gridded composites of 154 NWS WSR-88D sites in 
the US with an hourly temporal resolution and a spatial resolution of 1 km. Reflectivity values 
were converted to rain rate using the Marshall and Palmer (1948) relationship Z = 200R 16 . (This 
relationship is intended only to yield statistically reasonable spatial patterns of rain intensity for 
use in the simulations, not absolutely calibrated rain rate estimates.) A total of 22 radar 
reflectivity files were randomly selected during July and August 2002, each comprising an 
average of 3x1 0 2 * 4 rainy pixels. Because of the coarseness of the digitization of reflectivity Z, 

small upward and downward shifts in Z were applied to each image in order to yield a reasonably 
smooth combined histogram of modeled rain rates. This procedure effectively multiplied the 
number of radar grids to 110. The reader is referred to Chiu (2003) for additional details. 

2) Polarization calculations 

Starting with the precipitation structures simulated from the radar measurements, we 

simulated satellite-observed polarization P via a simplified plane-parallel radiative transfer 
model. The mass extinction coefficient of suspended cloud water tQj is listed in Table 1, which 
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where the simulated brightness temperature T B at polarization p (vertical or horizontal) is 
determined by the transmittance t, the specular emissivity of the surface (sea water in the study) 
e p , the surface temperature 7s, and the air temperature Ta- The transmittance t is given by 


t = exp 


cos 6 


( 10 ) 


where 0 is the incidence angle and x is the total optical depth due to hydrometeors. The surface 
temperature was approximated from the extrapolation of the temperature profile, assuming a 
temperature at the freezing level of 0°C and a lapse rate of 6.5 K/km. 7 a was estimated by the 


air temperature at the mid point between the surface and Zj: Since the emissivity of the ocean is 
polarized, both the vertically and horizontally polarized radiance was obtained. 

We define t a , ti, and t r as the transmittance attributed to the atmosphere, the suspended 
cloud water, and the rain water, respectively, and t\ as the total transmittance (the product t a , U, 
and t r ). Based on (2) and (9), it can be shown that the attenuation index (normalized 
polarization) can be written as 


p 'JZszIihliL 

'.fo- 7 Vo) + 'X>’ 


(ii) 


where T A ,o represents the air temperature when the suspended cloud water and rain water are 
absent. We assume 7 a = T A ,o in the simulations. In addition, the order of the first term for both 

the numerator and denominator is much smaller than the second term. Therefore, the attenuation 
index can be approximated as 



2 


h 


Wr 



U J 


t 2 t 2 

*/ 'r > 


( 12 ) 
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was computed from Liebe et al. (1991) assuming a temperature of 0°C. The relationship 
between the volume extinction coefficients of rain water k e r and rain rates (R) was estimated 
from Mie theory, assuming spherical rain drops with a liquid water temperature of 10°C and 

Marshall-Palmer drop size distribution. A power-law form was found to approximate the 
relationship well (Petty 1 994b), 

K, = aR\ (6) 

where coefficients a and b for each channel are shown in Table 1. The rain layer optical depth x r 
was then modeled as 

T,£ Z,kjR), (7) 

representing the contribution of suspended rain water to attenuation of the polarized ocean 
surface emission. Since there was no information about the values of the freezing height Z/ in 
the radar reflectivity product, a fixed value of 3 km was assumed for the purposes of this 
demonstration. The optical depth T/ attributed to suspended cloud water was modeled as 

T,=K el L, ( 8 ) 

where variations in the suspended cloud water content ( L in kg/m 2 ) were simulated via a 
lognormal random deviate applied to each grid cell. 

Because depolarization of the ocean surface emission by rain clouds depends primarily 
on total path attenuation and not on the details of the vertical structure of temperature or 
hydrometeor properties, we may use a highly simplified 1-D plane-parallel model to compute 
brightness temperatures at each grid point. The radiative transfer equation is written as 

T b . p = (1 " t)T A + e p tT s + (1 - f)( 1 - £ p )tT A , (9) 
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Three important characteristics of P are shown in Fig. 1. First, theoretically, in a 
homogeneous case of rain cloud, a unique non-linear relationship in P is expected. However, in 
reality, due mainly to variable beam-filling, P exhibits considerable scatter in 3-D space. Both 
observations and simulations show similar magnitudes of this effect. Second, P may sometimes 
be slightly greater than 1 in cloud-free cases (but smaller than 1 . 1 in most cases), due to a 
combination of instrument noise, errors in water vapor and surface wind speeds, and errors in 
regression equations. Our analytic models for the conditional distribution will account for this 
effect. Third, polarization differences at 10.65 GHz of less than 30 K are exceedingly rare in the 
TM1 data set; correspondingly, very few observed values of P\q are less than 0.4. This 
observation highlights the fact that rainfall is almost never both horizontally uniform and heavy 
enough to saturate the 10.65 GHz channel. 

4. Algorithm basis 

As stated in Bayes' theorem, the Bayesian posterior density function is determined by 
conditional likelihoods that statistically describe physical relationships between rain rate and 
microwave signal, and a prior rain rate distribution that represents our knowledge. Therefore, 
there are three key elements in our algorithm: the conditional likelihood, the prior distribution, 
and the estimator interpreting the posterior distribution. In this section, we introduce generic 
forms to model the prior and conditional distributions. These forms fitting to radar-radiative 
model simulations will be a basis for further sensitivity tests. 

a. Conditional probability density function 
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Note that t a cancels out in the calculation of P, and thus the absolute value of t u has little effect 
on P. Therefore, cloud-free brightness temperatures can be easily approximated employing a 
transmittance of unity in (9), in which case T B O = eT s . 

Once the high-resolution field of P had been computed for each simulated rain rate field, 
both the rain rate field and the P field were spatially averaged. The rain rate field was averaged 
using a moving window of 15x15 km, representing the nominal retrieval resolution of the 

algorithm. The simulated P values for each TMI channel were spatially averaged using a 
Gaussian approximation to the effective field-of-view for that channel. Gaussian random noise 
with standard deviations of 0.01, 0.02, and 0.02 for 10.65, 19.35, and 37.00 GHz channel, 
respectively, was then added to P in order to reasonably account for errors due to the highly 
simplified nature of the forward model, instrument noise, and similar variables. The above 
procedure yielded a large ensemble of matched FOV-averaged rain rates R and attenuation 
indices Pio, P\ 9 , and P 37 . 


3) Comparisons with tmi-derived attenuation index 

To evaluate the statistical representativeness of our simulated P, we compared the 3-D 
PDF of P derived from the simulations with that obtained from 110 actual TMI orbits during 
1999 and 2000 were obtained. Fig. 1 compares 2-D slices from the 3-D distributions for 
simulated and actual data. Despite the simplicity of the forward model and the utilization of 
continental radar observations as a proxy for the spatial structure of oceanic rainfall, there is 
surprisingly strongly qualitative similarity between the two distributions. We can therefore have 
some confidence in the utility of our simulations for constructing our Bayesian algorithm and for 
conducting further sensitivity tests. 
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The conditional likelihood f(P\R ) is a multivariate probability distribution. We used 
two methods to characterize the conditional PDF. The first method is to use a covariance matrix 
to include linear relationships only between rain rate and microwave signature, which is similar 
to the parameterization of current available Bayesian algorithms. However, as shown in 
observations and simulations (Fig. 1), linear properties are not sufficient to approximate the 
multichannel dependency of microwave signature. Therefore, in the second method, we took a 
successive approach to describe the conditional PDF that included linear as well as non-linear 
relationships. We will refer to these as "the linear model" and "the nonlinear model", 
respectively. 

1) The linear model 

Similar to a multivariate Gaussian distribution, the conditional PDF between P and R can 
be written as 


/( P 1 R ) ~ P \9) P 3l( a ~ P n) eX P 




(13) 


This closed-form function is essentially a normal distribution in the interior of the interval [0, a\ 
but is forced to zero at the boundaries by the term of P(a- P ) for each frequency, a is chosen to 
be 1.1, based on observed ranges of actual TMI-derived values. C is the covariance matrix, and 
Jut is 


\l = 0 u „ a * 2 ^ 3 ); 

(14) 

P := a tl , eX p[~ h H, R ] + C K.-’ 

(15) 


where the subscripts 1, 2, and 3 describe the quantities at 10.65, 19.35, and 37.00 GHz, 
respectively. Subindex i is the corresponding channel. Based on the radar-radiative simulations, 
there parameters can be approximated by Table 2 and 
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( 16 ) 


0.010 0.015 0.020 
C= 0.015 0.040 0.045 
0.020 0.045 0.060 

2) The nonlinear model 
The more complete physical model is developed to account for the linear as well as non- 
linear relations of P to rain rate. The distribution of P at a given rain rate R is approached 
hierarchically: 

/(P I R) = f(P , 7 1 R)f(P n I P„,R)f{P l0 1 P„.P„,R), (17) 

where /(P 37 1 R ) is the likelihood of P37 at a given R; /(P 19 I P 31 ,R ) is the PDF of P19 when P37 
and R are fixed; and the f{P w I P i9 ,P 37 ,R) describes the PDF of P10 while the other three variables 
are known. These conditional PDFs, for example, can be approximated as 

P 3 ,(o-P 5 ,)exp ;/>3 ,e[ 0,4 (18) 

/(P l9 I P 37 ,R) x P x9 [a — /^ 9 )exp — (P l9 — P 2 ) (19) 

where JJ.2, JH3, 02 and 03 are all determined by fitting the same radar-radiative model simulations. 
Complete parameterizations can be found in Chiu (2003). 

b. The prior distribution of rain rate 

Surface rain rate distributions have been commonly parameterized by lognormal 
functions (Houze and Cheng 1977; Kedem and Chiu 1987; Kedem et al. 1990, 1997; Sauvageot 
1994; Nzeukou and Sauvageot 2002), although some observations showed departures from the 
lognormal distribution (Jameson and Kostinski 1999) and some suggested that rain rates 
followed gamma distributions (Ison et al. 1971; Swift and Schreuder 1981; Wilks and Eggleston 
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1992). Since Cho et al. (2004) found that both lognormal and gamma distributions were able to 
characterize the PDF of rain rates from TRMM data, we used lognormal functions as the prior 
distribution. 

The lognormal density function for the prior distribution is denoted as log7V(//,a ) and 
defined as 


log N(r I jU,cr) = 


Ro^Zk 


0,7? -0 


exp 


2a 


-(In R-pf) 


,R> 0 


( 20 ) 


where p and a are the mean and standard deviation (mm/hr) of the variable. These two 
parameters vary with different rain rate observation datasets. Therefore, we will evaluate the 
effect of varying (p, a) on the simulated retrievals. 


c. Estimators of the posterior distribution 

When the conditional and prior likelihoods are specified, based on the Bayes' theorem 
((1)), the posterior distribution can be derived by 

, /WL 
J /(P I R)n(R)dR 

Integrations with respect to rain rate in the four-dimensional space were performed numerically. 
Once the posterior distribution is known, the two most common estimators were taken: the mean 
and the maximum likelihood estimate of the posterior probability distribution, denoted as MEAN 
and MLE, respectively. The corresponding Bayesian estimates are stored in a 3-dimensional 
lookup table for each P for single-pixel retrievals. 
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5. Sensitivity tests 

a. Experiment design 

A number of experiments are designed for the sensitivity test by various combinations of 
the prior and conditional likelihoods (summarized in table 3). The retrieval target was produced 
from a random number generator that followed all conditional PDFs of the nonlinear model 
(( 1 7)-( 1 9)). Since one of our purposes for the sensitivity tests is to better understand the 

behavior of retrieval at higher rain rates, we generated this dataset along with the prior 
distribution logA(0, 2) to prevent a scarcity of high rain intensity. 

Control run (RO) is the experiment that the Bayesian retrieval method uses exactly same 
conditional and prior PDFs with the retrieval target. An analysis of this control experiment can 
provide insight into the inherent uncertainty of the retrieval, since each PDF is perfect and no 
assumption is made in the algorithm. Furthermore, by comparing the control run with other 
experiments, the sensitivity of the algorithm to the prior distributions of rain rate can be 
evaluated by experiments R1 and R2, while R3 and R4 aim to understand the sensitivity to the 
conditional likelihoods. In experiment R3, we used a different parameterization for 
f(P i9 1 P 31 ,R) to investigate the sensitivity of the algorithm to various explicit functions. 
Experiment R4 applies the P -R relationships of the linear model (i.e., covariance matrix form) 

into the algorithm. This experiment is vital because it evaluates the adequacy of simple 
assumptions of the conditional PDFs in the retrieval algorithm, when in fact the dataset has a 
much larger degree of complexities. 

b. Intrinsic uncertainty of the algorithm 
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As we have emphasized through this paper, a key property of the Bayesian algorithm is 
its ability to obtain complete posterior PDFs of retrieved rain rates. Based on the complete 
posterior distribution, a probability statement can be made about the realization of the retrieval. 
This advantage allows us to assess intrinsic uncertainties of the retrieval, when the ideal physical 
model and exact prior distribution are applied to the algorithm. 

Figure 2 shows examples of the posterior distributions from experiment RO. Some 
posterior distributions have a single maximum over the entire rain rate range, but some have two. 
A single maximum expresses the situation that a given observation vector P provides 
unambiguous information about the most likely rain rate when both physical relationships and 
prior knowledge are taken into account. A bimodal distribution implies that two distinct rain 
intensities are of comparable likelihood. The peak on the left indicates a scene of widespread 
stratiform precipitation associated with a smaller rain rate, while the second peak suggests the 
possibility of a strong convective cell within that 15x1 5 km area. Based on the posterior PDFs 

of Fig. 2b, the MLE retrieval can increase from 5 to 60 mm/hr if F t0 decreases from 0.64 to 0.60 
and P ] 9 and P 37 remain the same. However, in reality, this magnitude of the change in P value 
may not relate to a dramatic change of rain intensity of the scene, but rather arises from the 
variations of the instrument noise, atmospheric condition, or/and the calculations of brightness 
temperatures. Therefore, caution should be exercised in these cases when single-pixel retrievals 
are of interest. Such cases may be easily detected via their larger associated standard deviations. 

Performance of the Bayesian algorithm at different rain rate ranges for RO experiment is 
demonstrated from the histogram of retrieved rain rates (as shown in Fig. 3). The title for each 
subplot is made of three components. The first component, RR, indicates the specific range of 
rain rates, where the retrieved rain intensity is drawn if its corresponding true rain rate in the 
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training dataset is in this range. The second component is the sample size of the histogram. The 
third component, MEAN or MLE, describes the estimate used to interpret the posterior 
distribution. The pair of the numbers on the upper-right comer in each subplot are the mean and 
standard deviation of the histogram, while the percentage is the proportion that the retrieved rain 
rates are in the same range as the actual rainfall intensities with respect to the whole histogram. 
Note that for the true range RR lower than 30 mm/hr, the histogram is plotted in a logarithmic 
scale. 

Results from R0 experiment using MEAN estimates suggest that the Bayesian algorithm 
is able to retrieve the light rain rate very well. For the moderate intensity (7-15 mm/hr), the 

retrieval algorithm captures 46% data points, and the mean value is about right. The algorithm 

tends to underestimate the rain rates when the actual intensity in training data increases to a 
heavier range. Meanwhile, the associated standard deviation of the histogram starts to increase 
as well. Even so, the retrievals still encompass 30% data in the true range. For the extremely 

large rain rate (greater than 75 mm/hr), there are 42% of data in the correct rain range, and the 

mode of the histogram falls within the true range. In this control experiment, the Bayesian 
algorithm shows the ability to retrieve rain intensity over all ranges, even for the case with 
extremely heavy precipitation. 

The histogram of MLE retrievals for experiment R0 is shown in Fig. 4. Results from the 
light rain regime demonstrate a satisfactory performance. However, the underestimation for 
intermediate intensities (4 to 30 mm/hr) is significant. Moreover, there are two distinct modes in 
the histogram when the true range RR goes up greater than 30 mm/hr. The one associated with a 
lower rainfall rate dominates in the RR range of [30, 75] mm/hr, and causes some retrievals at 
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least 20 mm/hr smaller than the true rain intensity. The other mode becomes dominant under the 
extremely heavy precipitation, yielding 40% data points in the correct range. 

In short, R0 experiment has demonstrated the retrieval ability of the Bayesian algorithm 
over various rain rate ranges when the prior and conditional likelihoods are both idealized. The 
single-pixel retrieved rain rates might be associated with a bias due to the inherent uncertainty in 
the physical relationships and the interpretation using MEAN and MLE. The inherent 
uncertainty might be reduced via the inclusion of additional information concerning other 
atmospheric or microwave variables. 

c. Sensitivity to the prior knowledge 

Analyses of experiments R1 and R2 demonstrate how retrieved rain rates change to 
various specifications of the prior distribution when the physical model remains the same. 
Compared to the control prior density function, the prior distribution in R1 experiment, 
log/V(0,l), has relatively smaller probabilities at very light rain rates and beyond 5 mm/hr. This 
property of the prior PDF leads the algorithm to retrieve reasonably for the true intensities 
between 0.2 and 7 mm/hr from both MEAN and MLE estimates. However, the smaller 
probability of the prior PDF at higher rain rates obviously limits the ability of the algorithm to 
retrieve heavy precipitation as shown in Fig. 5. Note that we also conducted other experiments 
changing p and a by 10% in prior PDFs, and found that slight variations of the prior distribution 
had no significant effects on the Bayesian algorithm. 

A non-informative prior distribution is used in experiment R2, which assigns equal 
weight to all values over the parameter space. Based on the Bayes' theorem, it is clear that the 
posterior probability density is now proportional to the likelihood represented only by the data in 
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this experiment, and thus the retrieval is dominated by the behavior of the physical model. 
Results demonstrate that a uniform prior PDF yields much higher retrieved rain rates by 7-15 
mm/hr in light and moderate rain situations (not shown). For the true range with heavier rain 
rates, experiment R2 captures around 50% data points in the correct range. Since experiment R2 
only includes the information of data with no prior knowledge, it is suggested that the P we used 
is sufficient to reflect the rainfall signal even in heavy precipitating systems. 

In summary, the prior rain rate distribution plays a crucial role in the Bayesian algorithm 
retrieval, since the characteristics of the prior PDFs determine the retrieval ability and bias at 
different rain rate ranges. However, the Bayesian retrieval is not sensitive to slight fluctuations 
of the parameters used in the prior PDFs. Therefore, when the prior rain rate distribution is 
applied reasonably, the Bayesian algorithm is still robust. 

d. Sensitivity to the conditional distribution 

This section advances our understanding of the sensitivity of Bayesian retrievals to the 
characterization of conditional likelihoods that provide statistical and physical relationships 
between P and R. First, we attempted to understand whether different explicit functions 
characterizing conditional distributions would have a substantial impact on retrievals. In 
experiments R0 and R3, the conditional PDFs were both derived from the same radar-radiative 
simulations, and they presented very similar relationships between P and R. As a result, 
experiment R3 yields very consistent retrievals with R0 experiment for both MEAN and MLE 
estimates (not shown). It indicates that the specifications of the conditional distributions are not 
critical if the P—R relationships represented by those PDFs are not far from reality. 
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Second, we attempted to evaluate the adequacy of using simplified conditional PDFs to 
represent more complicated behaviors of data by experiment R4. Fig. 6 depicts the microwave 
multichannel relationships represented in the experiment RO (upper panel) and R4 (bottom 
panel). These relationships demonstrate similar patterns, including the orientation and spread- 
out of the contours. However apparently, the locations of the maximum likelihoods are shifted 
to lower P values in experiment R4. 

Histograms of retrievals from the MEAN estimates of experiment R4 are shown in Fig. 7 
(results of MLE are not shown due to similar responses). In general, for both MEAN and MLE 
estimates, the use of a linear covariance matrix to describe conditional likelihoods significantly 
overestimates rain rates, and the variations of the retrieval histograms are almost twice than those 
of the control experiment RO. More specifically, for very light rain rates, R4 only produces 0. 1 % 

data points in correct ranges, while the control run successfully retrieves almost half of data. The 
positive bias becomes more significant when true rain rates are with moderate intensity. Under 
the condition that the true rain rates are only 2 to 15 mm/hr, some retrieval from R4 are even 
greater than 50 mm/hr. The large variation in retrieved intensity is also seen in the case of heavy 
precipitation. 

One should not conclude that a simplified physical model is always unsuitable for the use 
of a Bayesian algorithm, since considerable differences exist in the multichannel relationships 
between the retrieval target dataset and the applied physical model for experiment R4. The 
retrieval bias may be reduced if the simple physical model is improved or tuned to be closer to 
the training dataset. However, we must note that the covariance matrix facilitating the linear 
model was estimated from the same radar-radiative simulations that were used in the explicit 
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functional model. Therefore, it is clear that the simplified physical model cannot explain all of 
data behaviors, and this deficiency will lead to a significant bias in some ranges of rain rates. 

6. Applications to TMI data 

This section provides applications of our new Bayesian algorithm to TMI data, including 
implements and validations against PR data. In previous sections, we trained our algorithm with 
radar-radiative simulations. In an attempt to construct an algorithm based on calibrate 
precipitation measurements, we used the same framework here, but the prior and conditional 
PDFs were adjusted based on actual PR and TMI match-ups. 

a. PR-TMI match-up data 

Since there is no long-term/wide-area dense rain gauge network or other reliable data to 
provide true rainfall intensity over the ocean, we used near-surface rain rates of TRMM standard 
product 2A25 (retrieval from the Precipitation Radar (PR)) to develop and validate our TMI 
Bayesian retrieval algorithm. The 2A25 PR products provide retrieved rain rates with 4-km 
resolution. Since our retrievals represented rain intensity at a 15-km resolution, PR surface rain 
rates were averaged using a Gaussian weighting function and interpolated to the locations of 
TMI pixels. For convenience, the averaged rain rate estimates are hereafter referred to as PR 
rain rates. Note that PR reflectivity measurements can suffer from sidelobe contamination when 
the main beam of the PR is off-nadir, and the resulting errors would propagate into rain rate 
estimates. To minimize the likely impact of these errors, only match-ups between TMI 
observations and near-nadir PR estimates are utilized. 
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The probability distribution of near-nadir 15x15 km PR rain rates from data of January, 

April, July, and October 1998 was found to be comparable with (20). Due to the minimum 
detectable threshold (~17 dBZ) of PR reflectivity, we applied a cut-off of 0.04 mm/hr to yield 
parameters (|i, o) of (-2.8, 2.0). In addition, we used two months PR-TMI match-ups (January 

and July, 1998) to adjust conditional PDFs of our Bayesian algorithms. Detailed 
parameterizations are given in Chiu (2003). 

b. Validation datasets 

We used a number of cases to evaluate overall performances for all algorithms. In 
addition to looking at quasi-global performance, we also considered specific classes of 
precipitating systems. These included tropical cyclones, frontal rain bands in extra-tropical 
cyclones, and some scattered strong convection cells. The purpose behind the selection of these 
cases is to evaluate the degree to which algorithm performance depends on the nature of the 
precipitation cloud system under consideration, especially when the latter is in some sense 
atypical of global precipitation. 

1) Individual test cases 

A typhoon case from TMI orbit number 336 as well as twelve oceanic cases from Bauer 
et al. (2001) were selected as individual test cases for retrieved rain rates. As before, the “true" 
rain rate in the validation dataset is obtained from the coincident PR 15x15 km averaged rain 
rate. 

2) 1998/04 PR-TMI GLOBAL MATCH-UP DATA 

For quasi-global validation purposes, 1 18 orbit files in April 1998 were randomly 
selected to validate the overall performance of all algorithms. Note that our Bayesian retrieval 
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algorithm was derived from the data of January and July in 1998, and therefore, the PR-TMI 
match-up data of April is independent for the purposes of this validation. 


c. Benchmark algorithms 

Two benchmark algorithms are utilized in this study to provide intercomparisons for the 
purpose of validations. One benchmark algorithm is the Goddard Profiling algorithm (GPROF), 
which is the official algorithm for TM1 data (Kummerow et al. 1996, 2001; Olson et al. 1996, 
2005). This algorithm introduced a database to represent the presumed probability distribution 
of rain rate and cloud profiles. Cloud profiles with microwave signatures that are close to 
satellite observations are picked from the database as candidates. The selected candidates are 
then averaged to yield the best surface rain rate and precipitation structure for each pixel, based 
on the relative occurrence of each cloud profile in the database. 

The other benchmark algorithm is a linear regression model developed from PR-TMI 

match-up data in January and July 1998. Regression variables include the P\o, P 19, P37, and 
scatter index at 37 and 85 GHz (S37, and S% 5 , Petty 1994a). The linear model is formulated as 
follows: 
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where R is retrieved rain rate in mm/hr, and the asterisk symbol represents transformed variables. 
In this model, all pixels were retrieved without rain screening. Therefore, some constraints were 
needed for calculated rain rate fields in order to exclude unrealistic precipitation. It is assumed 
that the pixel is not rainy when its retrieved rain intensity is smaller than 0.5 mm/hr. 

The linear algorithm may be regarded as the least sophisticated of the algorithms, in that 
it does not account for either nonlinearity in the relationship between R and the microwave 
observables or the highly non-Gaussian distributions of these variables. Hence, differences in 
the performance of this algorithm from the other algorithms may be regarded as a measure of the 
relative importance of these characteristics. 

d. Validation metrics 

There are several common validation statistics employed to characterize differences 
between retrieved estimates and true values. Of these, the difference between the means (or 
mean bias, henceforth BIAS), the root-mean-squared difference (henceforth RMSD), and the 
linear correlation coefficient are the most commonly used. It bears emphasizing that no 
performance statistic for any single algorithm is meaningful when considered in isolation, as it 
invariable depends on the statistical and physical properties of the validation data set. We can 
therefore assess the "goodness" of any outcome only by comparing the results of several 
competing algorithms applied to identical data sets. 

The linear correlation coefficient represents the strength of the linear relationship 
between observations and retrievals. Unlike either BIAS or RMSD, it is not affected by linear 
systematic errors in either the validation data or the retrievals and it is therefore a better measure 
of intrinsic (or potential) algorithm performance at discriminating between high and low rain 
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rates. However, a systematic non-linear bias, though potentially correctible, will adversely affect 
the correlation coefficient and therefore give a misleading indication of intrinsic algorithm skill. 

A drawback to any of the traditional metrics is that they do not distinguish between 
performances at high and low rain rate values; moreover, they do not necessarily distinguish 
between systematic errors due to correctable nonlinearities in an algorithm's response and 
retrieval errors of a more random (and therefore non-correctable) nature. We therefore also 
employ an adaptation of the Heidke Skill Score (HSS) proposed by Conner and Petty (1998). 
They specified separate rain rate thresholds R v and R, for the validation data and for the 
retrievals, respectively, and to allow these to vary independently of one another. The HSS may 
then be computed and plotted as a continuous bivariate function of the two thresholds. One may 
then identify those combinations of the two thresholds that maximize the skill score. The 
maximum skill found for any given value ofR v is independent of any calibration bias (linear or 
non-linear) in the retrievals. This method therefore offers a way of characterizing the intrinsic 
performance of an algorithm at discriminating between high and low (or zero) rain rates, 
independent of any biases, while also allowing the presence of any such biases to be inferred via 
the relationship between R v and R r that maximizes the skill score. 

e. Results 

The performance of all algorithms as measured using conventional validation statistics 
(bias, root-mean-squared error, and correlation coefficient) are summarized in Table 4. For the 
purposes of this study, the outcome is deemed satisfactory if the performance of our Bayesian 
algorithm is found to be comparable to that of other algorithms, given that ours has the added 
advantage of producing posterior PDFs of surface rain rates. 
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1) Typhoon case 

Table 4 shows that the Bayesian-MEAN has a zero bias, a comparable correlation 
coefficient with GPROF and the linear model, and a slightly larger RMSD for TMI orbit 336. 

To obtain a direct sense of how different the retrieval from each algorithm behaves, PR and 
algorithm-retrieved rain rates are mapped in Fig. 8. Qualitatively, all algorithms are able to 
retrieve the eye, two separate rain bands, and the overall cyclonic structure of the typhoon. 
Quantitatively, most algorithms produce rain rate intensities of similar magnitude to the PR 
interpolated data. A scatter plot of retrieval vs. PR rain rate (Fig. 9) also demonstrates good 
agreements for all algorithms, though underestimations are seen when rain rates exceed 30 
mm/hr. In addition, the dynamic retrieval range in the Bayesian-MLE algorithm is around 10 
mm/hr in this case, which is much smaller than that derived from the posterior mean. This 
deficiency leads to a significant negative bias in Bayesian-MLE retrievals (Table 4). 

Figure 10 illustrates posterior PDFs for points A-D that are marked in Fig 8d. Point A is 

located around the eye of the typhoon, and has a narrow spectrum and distinct peak at a smaller 
rain rate. Points B and C are on the different sides of the wall. Their PDFs show that Point B 
has a heavier tail and thus yields a much larger mean rain rate than Point C. Point D is located in 
the one of spiral rain bands. Its associated posterior PDF has a heavy tail as well, and highly 
skewed to the right. 

2) BAUER’S 12 OCEANIC CASES 

For these twelve oceanic cases, retrievals from the Bayesian-MEAN algorithm associate 
with a bias of only 0.08 mm/hr (Table 4), but have a larger root-mean-squared error and a 
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smaller correlation coefficient, implying the lack of retrieval ability for heavy rain rates. The 
linear model has zero bias and the largest correlation coefficient here. The Bayesian-MLE has 
comparable performance with GPORF for these cases. 

Figure 1 1 shows contours of 2-dimensional HSS of each algorithm, and indicates that the 
highest skill scores occur in the range of 0-10 mm/hr for all algorithms. For any given validation 
rain rate threshold R v , the value of the algorithm rain rate threshold R r can be identified for which 
the HSS is maximized. We may refer to this as the optimized rain rate threshold R opt for that 
value of R v . Two additional useful types of plots follow from this definition. One is a plot of the 
/? 0 p, vs. R v , which is indicative of apparent bias, relative to the validation data. The other is the 
plot of maximum HSS (i.e., the skill score computed at R 0 pt ) vs. R v , which represents the intrinsic 
(bias-independent) discrimination ability of the algorithm with respect to rain rates exceeding R v . 

Figure 12 depicts the relationship between ^ opl and /? v for the ensemble of twelve 
overpasses. The linear relations revealed in plots demonstrate that most algorithms, especially 
the Bayesian-MEAN model, offer good agreement with PR data when true rain rates is less than 
20 mm/hr. The degradation at higher rain rate in our Bayesian algorithm originates from 
insufficient data samples at higher rain rate, and the resulting poor fit to the data when we 
specified our conditional likelihoods. Fig. 13 shows the maximum Heidke skill score vs. R v for 
this data set, and all algorithms have comparable discrimination abilities. Since the proportion of 
number of hits to the total data points in the contingency table for all algorithms drops 
dramatically (to less than 1%), the maximum HSS might not be statistically meaningful when 

true rain rate is above 10 mm/hr. Therefore, we only show the maximum HSS for rain rates up to 
1 0 mm/hr. 
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3) April 1998 

In the randomly selected cases in April 1998, our Bayesian retrieval algorithms show the 
smallest bias and a comparable correlation coefficient with that of GPROF. The linear model 
has the best correlation with PR, but is associated with a negative bias. HSS analysis reveals 
satisfactory performance of our Bayesian algorithms (figures not shown). 

7. Summary and discussions 

This paper has introduced a new Bayesian retrieval algorithm that provides continuous 
posterior probability distributions of rain rate from satellite microwave observations. The 
generic forms of this algorithm were comprised of explicit, closed-form functions, and based on 
simulations using high-resolution radar composites and ID plane-parallel radiative transfer 
model. From the derived posterior distributions, various statistical estimators, such as the mean 
and maximum likelihood estimate, can be used to serve as the single-pixel retrieved rain rate, 
depending on the desired characteristics of the retrieval purpose. 

We used synthetic randomly-generated datasets to clarify the theoretical advantage of the 
Bayesian algorithm, as well as to demonstrate its retrieval ability when imperfect information 
was applied to the algorithm, which is often the case in reality. We should be aware of the 
inherent retrieval uncertainty associated with certain scenes, especially in connection with 
variable beam-filling effects. In addition, a significant low bias at higher rain rate is found even 
when the prior and conditional likelihoods are perfectly modeled. This bias is attributed to the 
loss of physically direct information concerning rain rate due to saturation of microwave 
observables. In such cases, the prior PDF supplies most of the information to the posterior PDF 
and favors the more frequent lesser rain rates. 
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The sensitivity tests revealed that retrieved surface rain rate is sensitive to assumptions in 
the prior rain rate distribution. The effect of the prior distribution on retrieval is different at 
various rain rate ranges, mainly determined by the characteristics of the prior PDF. We also 
found that a simple covariance matrix is not sufficient to describe statistical and physical 
dependency of microwave measurements on rain rate. This insufficiency can lead to substantial 
errors and bias in retrieval. On the other hand, a use of explicit functional models can provide 
more accurate and complete relationships without increasing computational loading. Once these 
explicit, closed-form functions are well fitted to the training dataset, the Bayesian algorithm is 
not very sensitive to the slight change in the parameterizations. 

Applications to TMI data demonstrate that the performance of our Bayesian algorithm is 
comparable with that of GPROF and a new linear model, while ours also provides complete 
posterior rain rate probability distributions. In general, retrievals from the Bayesian-MEAN 
algorithm have very small biases and good linear correlations with PR-derived rain rates. The 
Bayesian-MLE algorithm revealed an excellent ability to retrieve light rain intensity (shown in 
sensitivity tests as well). However, the maximum-likelihood rain rate is generally much less than 
the expectation value obtained from the same posterior PDF, especially for higher rain rates. If 
unbiased averages over time and/or space are required, then the Bayesian-MEAN algorithm is 
the preferred choice. But if more representative rain rates in areas of light rain are required in 
instantaneous "snapshots" of precipitating systems, the Bayesian-MLE result might be preferred. 

We also found that our Bayesian algorithms had greater difficulty retrieving the heaviest 
rain rates. Large errors in retrieving heavy rain rates, even though the latter are relatively 
infrequent, can have large effects on computed root-mean-square errors and correlation 
coefficients. The inability of all algorithms to achieve unbiased results in heavy rainfall is partly 
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Table 1 . Parameters a and b in the approximation of liquid water extinction coefficient. 

Table 2. Coefficients of a M , b n , and c n . 

Table 3. Information of designed experiments in sensitivity tests, including the experiment ID, 
the training dataset, and the specifications of the prior and conditional likelihoods applied to the 
Bayesian algorithm. 

Table 4. Bias, root-mean-squared difference(RMSD), and correlation coefficients (Corr) for 
each algorithm and each validation dataset. The unit of bias and RMSD is mm/hr. 
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Table 1. Parameters a and b in the approximation of liquid water extinction coefficient. 


Channel (GHz) 

K e,i (m 2 /kg) 

a 

b 

10.65 

0.0244 

0.002956 

1.18759 

19.35 

0.0785 

0.01585 

1.09403 

37.00 

0.261 

0.06896 

1.01876 

85.50 

0.932 

0.2799 

0.84693 
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Table 2. Coefficients of a„, b > M , and c M . 




Channel (GHz) 

a n 

K 

C M 

10.65 

0.75 

0.03 

0.30 

19.35 

1.35 

0.05 

-0.30 

37.00 

1.55 

0.10 

-0.50 
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Table 3. Information of designed experiments in sensitivity tests, including the experiment ID, 
the training dataset, and the specifications of the prior and conditional likelihoods applied to the 
Bayesian algorithm. 


Experiment ID 

n(R) 

Physical model 

RO 

logN(0,2) 

Nonlinear model from (17) - (19) 

R1 

logN(0,l) 

Nonlinear model from (17) - (19) 

R2 

Uniform [0, 100] 

Nonlinear model from (17) — (19) 

R3 

logN(0,2) 

Different parameterizations in (19) 

R4 

logN(0,2) 

Linear model from ( 1 3) — ( 1 6) 
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Table 4. Bias, root-mean-squared difference (RMSD), and correlation coefficients (Corr) for 
each algorithm and each validation dataset. The unit of bias and RMSD is mm/hr. 



Bias 

Orbit 336 
RMSD 

Corr 

Bauer’s cases 
Bias RMSD Corr 

Bias 

April 1998 
RMSD 

Corr 

GPROF 

-0.22 

2.68 

0.88 

-0.22 

1.64 

0.76 

-0.21 

1.18 

0.78 

Linear model 

-0.29 

2.71 

0.88 

0.00 

1.58 

0.78 

-0.08 

1.09 

0.81 

Bayesian-MEAN 

0.00 

2.89 

0.85 

0.08 

1.84 

0.71 

0.03 

1.26 

0.78 

Bayesian-MLE 

-1.04 

3.75 

0.85 

-0.29 

1.80 

0.74 

-0.01 

1.21 

0.75 


/ 


Figure Captions 

Fig. 1 . Contours of the number of pixels based on TMI data (the first and the third columns). 
Contours are logarithmically spaced; actual value is 10\ where x is the contour label, x are 
plotted for values of [0.5, 1, 2, 3, 4,5], 

Fig. 2. Examples of derived posterior rain rate distributions at some given P vectors in 
experiment R0. The observation vector (Pio, P\% P37) is presented by the three numbers in 
parentheses. 

Fig. 3. Histograms of retrievals at different rain rate ranges for R0 experiment. Titles contain 
information about range of true values RR, sample size, and the estimator. Numbers in 
parentheses are the mean and standard deviation of the histogram. Percentages are the fractions 
of retrieved rain rates located in the correct range. 

Fig. 4. Same as Fig. 3, but using MLE estimations. 

Fig. 5. Retrieval histogram of experiment R1 at rain rate ranges of [2, 4], [15, 30], and [50, 75] 
mm/hr for MEAN (upper panel) and MLE (bottom panel). 

Fig. 6. Joint PDFs of the P vector for experiment R0 (upper panel) and R4 (bottom panel). 
Contours are plotted for [0.05, 0.5, 1.0, 2.5, 5.0, 7.5, 10]. 
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Fig. 7. Same as Fig. 3, but for experiment R4. 


% 


Fig. 8. (a) PR interpolated rain rates with a 15-km resolution for TRMM/TMI orbit 336. 
Retrieved rain rate (mm/hr) from (b) GPROF, (c) the linear model algorithm, (d) Bayesian- 
MEAN, and (e) Bayesian-MLE models for TRMM/TMI orbit 336. 

Fig. 9. Scatter plot of retrieved rain rate vs. PR rain rate for all algorithms for TMI orbit 336. 

Fig. 10. Posterior PDFs of retrieved rain rates at locations A, B, C, and D that are marked in Fig. 
8(d). 

Fig. 1 1. 2-D distribution of Heidke skill scores (HSS) for the 12 selected cases from the Bauer et 
al. (2001). Retrieval algorithms are: (a) GPROF, (b) the linear model, (c) Bayesian-MEAN, and 
(d) Bayesian-MLE. The value noted in the bottom-right comer of each plot indicates the highest 
HSS of the algorithm. Contours are plotted with an interval of 0. 1 . 

Fig. 12. Plots of the best algorithm rain rate threshold with respect to the threshold of PR rain 
rate for the Bauer’s cases with (a) GPROF, (b) the linear model, (c) Bayesian-MEAN, and (d) 
Bayesian-MLE. 

Fig. 13. Plots of the maximum Heidke skill score vs. the PR rain rate threshold in the range of 
[0, 10] mm/hr for Bauer’s cases. 
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Fig. 1. Contours of the number of pixels based on TMI data (the first and the third columns). 
Contours are logarithmically spaced; actual value is 10 x , where x is the contour label, x are 
plotted for values of [0.5, 1, 2, 3, 4,5]. 
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Fig. 2. Examples of derived posterior rain rate distributions at some given P vectors in 
experiment RO. The observation vector (P\ 0 , P\ 9 , Pii) is presented by the three numbers in 
parentheses. 
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Fig. 3. Histograms of retrievals at different rain rate ranges for RO experiment. Titles contain 
information about range of true values RR, sample size, and the estimator. Numbers in 
parentheses are the mean and standard deviation of the histogram. Percentages are the fractions 
of retrieved rain rates located in the correct range. 
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Fig. 5. Retrieval histogram of experiment R1 at rain rate ranges of [2, 4], [15, 30], and [50, 75] 
mm/hr for MEAN (upper panel) and MLE (bottom panel). 
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Fig. 7. Same as Fig. 3, but for experiment R4. 
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Fig. 1 1. 2-D distribution of Heidke skill scores (HSS) for the 12 selected cases from the Bauer et 
al. (2001). Retrieval algorithms are: (a) GPROF, (b) the linear model, (c) Bayesian- MEAN, and 
(d) Bayesian-MLE. The value noted in the bottom-right comer of each plot indicates the highest 
F1SS of the algorithm. Contours are plotted with an interval of 0. 1 . 
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Fig. 1 2. Plots of the best algorithm rain rate threshold with respect to the threshold of PR rain 
rate for the Bauer’s cases with (a) GPROF, (b) the linear model, (c) Bayesian-MEAN, and (d) 
Bayesian-MLE. 
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Fig. 13. Plots of the maximum Heidke skill score vs. the PR rain rate threshold in the range of 


[0, 10] mm/hr for Bauer’s cases. 






Fig. 8. (a) PR interpolated rain rates with a 15-km resolution for TRMM/TM1 orbit 336. 
Retrieved rain rate (mm/hr) from (b) GPROF, (c) the linear model algorithm, (d) Bayesian- 
MEAN, and (e) Bayesian-MLE models for TRMM/TMI orbit 336. 
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Fig. 9. Scatter plot of retrieved rain rate vs. PR rain rate for all algorithms for TM1 orbit 336. 
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"Bayesian Retrieval of Complete Posterior PDFs of Oceanic Rain Rate From 

Microwave Observations" 

J. Christine Chiu (JCET) and Grant W. Petty (Univ. of \ ¥aohingto fl) 
accepted by J. Applied Meteorology U/iscertC'n 

Popular Summary 

Precipitation is an essential element in the quantitative understanding of the global 
hydrological cycle. Global surface rainfall is now widely estimated from satellite passive 
microwave observations. Inference of surface rainfall from microwave radiation would 
be less troublesome if the relationship between these two variables were unique and 
reasonably linear. Unfortunately, not only is the relationship non-monotonic owing to the 
competing effects of scattering and emission, but a variety of microphysical and 
environmental factors introduces significant variability into the relationship. 

Bayes’ theorem offers a rigorous and completely general theoretical framework for 
retrieving atmospheric variables from remote sensing measurements. We have fully 
exploited this theorem, and introduced a new Bayesian algorithm for retrieving surface 
rain rate from Tropical Rainfall Measurements Mission (TRMM) Microwave Imager 
over the ocean. While other rain rate algorithms have been published that are based at 
least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm 
that yields not just a single “best” estimate of rain rate, but rather a continuous posterior 
probability distribution of rain rate at any location. 

We used synthetic datasets to clarify the theoretical advantage of the Bayesian algorithm, 
as well as to demonstrate its retrieval ability when imperfect information was applied to 
the algorithm, which is often the case in reality. We have found that even when perfect 
knowledge is used, biased retrievals may occur at high rain rates. This bias is not the 
result of a defect of the Bayesian formalism but rather represents the expected outcome 
when the physical constraint imposed by the radiometric observations is weak, due to 
saturation effects. 

For real-world applications, we validated rain rate retrievals against estimates from the 
TRMM Precipitation Radar. The performance of our Bayesian algorithm is found to be 
comparable to that of other benchmark algorithms, while having the additional advantage 
of providing a complete continuous posterior probability distribution of surface rain rate. 



